Abstract

We develop good heuristics to schedule tasks on supercomputers.
Supercomputers comprised of multiple pipelines as well as those comprised of
asynchronous multiple processors are considered. In addition, we consider the
case when different pipes or processors run at different speeds.
1. Introduction

A block diagram for a multiple pipeline vector supercomputer ([6]) is given in
Figure 1. Instruction fetches and decodes are carried out by the instruction
processing unit. Scalar instructions are sent to the scalar processor while
vector instructions are sent to the vector controller. The vector controller
receives vector instructions from the instruction processing unit. These
instructions are set up on the vector access controller, buffer and pipeline.
Data is brought to and from the pipelines by the vector access controller via
the vector buffer. The vector buffer is essentially a cache that is used to
close the gap between memory access speed and vector pipeline speed. The vector
pipeline actually consists of several (say $m$) independent pipelines. Each
pipeline is capable of executing every instruction (though during a single
vector instruction the instruction executed does not change) and the vector
controller is capable of scheduling several vector instructions simultaneously.


Figure 1 Block diagram of a multi-pipeline vector supercomputer.


The pipelines constituting the vector pipeline may be identical or uniform.
Thus with pipeline $i$ we may associate a speed $s_i$, $1 \le i \le m$. When all
the $s_i$ are the same, we say that the pipelines are identical. The speed of a
pipeline is measured relative to that of a unit pipeline which by definition
has a speed of 1.

There are three aspects to executing a vector task on a pipeline. First,
there is the time needed to set up the instruction and get the first operand
pair to the pipeline. This is the start up time. Next, there is the time needed
to perform the instruction on an operand pair and bring in the next operand
pair. This is the latency time. Finally, there is the flush time. This is the
time needed to perform the instruction on the last operand pair and move the
results out of the pipeline. In this paper, we shall make the simplifying
assumption that the start up, latency, and flush times on a unit pipeline are
the same for every vector instruction. Let $t_0'$ denote the sum of the start up
and flush times and let $t_l$ denote the latency time. The total time, $t$,
needed (called the processing time) by a unit pipeline to run a vector
instruction on a vector of length $L$ is given by the equation [6]:

$$t = t_0' + t_l (L - 1) = (t_0' - t_l) + t_l L = t_0 + t_l L$$

where $t_0 = t_0' - t_l$ is called the overhead time.

For a typical unit pipeline, $t_0$ will be much larger than $t_l$. A pipeline
with speed $s$ can in $\delta$ time perform $s\delta$ units of processing. Thus
if a task needs $t$ units of processing on a unit pipeline, it can be completed
in $t/s$ time units on a pipeline of speed $s$.
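
As a small numeric illustration of the two formulas above, the following sketch evaluates the processing time on a unit pipeline and the elapsed time on a pipeline of speed $s$. The function names and the sample values (overhead 10, latency 1, vector length 64) are illustrative assumptions and do not come from the paper.

```python
def processing_time(t0, tl, length):
    """Processing time on a unit pipeline: t = t0 + tl * L."""
    return t0 + tl * length


def run_time(t0, tl, length, speed):
    """Elapsed time on a pipeline of the given speed: t / s."""
    return processing_time(t0, tl, length) / speed


# Hypothetical values: overhead 10, latency 1, vector length 64.
unit = processing_time(10, 1, 64)   # units of processing on a unit pipeline
fast = run_time(10, 1, 64, 2)       # elapsed time on a pipeline of speed 2
```

With these assumed values the unit pipeline needs 74 time units, and the pipeline of speed 2 needs half of that.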

Let us assume that a set of $n$ tasks is to be scheduled on the $m$ pipelines.
In general there will be a precedence relation associated with the task set.
However, in this paper we shall consider only the case when this relation is
null, i.e., the tasks are independent. Let $L_i$ be the length of vector task
$i$ and let $t_i = t_l L_i$. We shall require that $t_i > 0$. A unit pipeline
will require $t_0 + t_i$ time to complete task $i$.

A  schedule  is  an  assignment  of  tasks  (or  portions  of  tasks)  to  time  slots  on 
the  pipelines  such  that: 
1. No pipeline executes more than one task at any given time.

2. No task is being executed on more than one pipeline at any time.

3. All tasks are completed by the end of the schedule.

Note that tasks may be scheduled preemptively and that every time a task is
started, the overhead penalty of $t_0$ units of processing is incurred.
Consequently, tasks (or portions thereof) must not be scheduled for time slots
of size less than $t_0/s_i$ on processor $i$. The scheduled slot should actually
be larger than this if any useful work is to be performed.

The  length  of  a  schedule  is  the  earliest  time  by  which  all  the  pipelines  have 
completed the work assigned to them. In a nonpreemptive schedule, a task is
executed  continuously  from  start  to  finish  on  the  same  pipeline.  A  task  is  said 
to  be  scheduled  preemptively  if  it  is  assigned  to  two  or  more  noncontiguous  time 
slots  on  the  same  pipeline  or  is  assigned  for  processing  to  two  or  more  pipelines. 
Throughout  this  paper,  we  assume  that  the  number  of  tasks,  n,  to  be  scheduled 
is  no  less  than  the  number  of  pipelines,  m,  available. 

The advantages to be reaped from preemptive schedules can be seen from a
simple example. Let $n = 3$, $m = 2$, $s_1 = s_2 = 1$, $t_0 = 1$,
$t_1 = t_2 = t_3 = 100$. If no preemptions are used, it takes the 2 pipelines
202 time units to complete the 3 tasks (Figure 2a). On the other hand, by using
preemptions, the 3 tasks can be completed in 152 time units (Figure 2b). The
shaded area in each figure indicates the overhead time of $t_0$.
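
The arithmetic behind the two schedule lengths in this example can be checked with a short sketch. It is only an illustration under stated assumptions: `best_nonpreemptive_length` tries every assignment of whole tasks to identical unit pipelines, and `balanced_length` merely spreads the total work, plus one extra overhead per split, evenly over the pipelines (which is what the preemptive schedule of the example achieves). Both function names are hypothetical and neither is a general scheduler.

```python
from itertools import product


def best_nonpreemptive_length(times, m, t0):
    """Try every assignment of whole tasks to m identical unit pipelines."""
    best = None
    for assignment in product(range(m), repeat=len(times)):
        loads = [0] * m
        for task, pipe in enumerate(assignment):
            loads[pipe] += t0 + times[task]   # one overhead per task
        if best is None or max(loads) < best:
            best = max(loads)
    return best


def balanced_length(times, m, t0, splits):
    """Length when all work, plus one extra overhead per split, is spread evenly."""
    return (sum(t0 + t for t in times) + splits * t0) / m


times = [100, 100, 100]
no_preempt = best_nonpreemptive_length(times, 2, 1)
one_split = balanced_length(times, 2, 1, splits=1)
```

For the task set of the example these evaluate to 202 and 152 respectively.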


(a) No Preemptions (b) Preemptions

Figure 2 Example schedules

We are interested, in this paper, in developing algorithms to schedule task
sets so as to minimize the schedule length. Before discussing work previously
done on this problem, we introduce another supercomputer model for which this
scheduling problem is of interest. Figure 3 gives the block diagram for a
supercomputer comprised of $m$ asynchronous and independent processors. Each
processor starts with its schedule of tasks (and subtasks) and repeatedly
performs the following steps:

1. Set up the next task (or subtask) to be performed. This will involve getting
the program and data for this task from the common memory and transferring
it to the local memory.

2. Execute the task for the specified duration.

3. Flush the processor. This would involve moving the results of the
computation back to the common memory.


Figure  3  Block  diagram  of  a  supercomputer  with  asynchronous  processors. 
As in the case of a vector pipeline supercomputer, the $m$ independent
processors may or may not be identical. In general, there will be a speed $s_i$
and a local memory size $\mu_i$ associated with processor $i$. Task $i$ will
require a total of $t_i$ units of processing (excluding overhead) on a unit
processor (i.e., a processor whose speed is 1). In addition, task $i$ will
require $u_i$ units of local memory to run. Hence, this task (or portions of it)
can be run only on those processors that have at least $u_i$ units of local
memory. Once again, we are interested in constructing schedules that have
minimum length. We make the simplifying assumption that the common memory is
sufficiently interleaved that all processors can do their set up and flush
simultaneously.

It  is  not  too  difficult  to  see  that  the  problem  of  constructing  minimal  length 
preemptive  schedules  for  an  m  pipeline  vector  supercomputer  is  identical  to 
that of constructing such schedules for a supercomputer that has m asynchronous
processors all of which have the same amount of local memory. So, in
future  discussion  we  shall  explicitly  refer  only  to  the  m  asynchronous  processor 
case.  All  our  results  trivially  carry  over  to  the  case  of  m  pipelines. 

It is well known that constructing minimum length preemptive schedules is
NP-hard even when there are only 2 identical processors with equal memory size
([4]). When the start up and flush time is zero (i.e., $t_0 = 0$), optimal
schedules may be constructed efficiently. McNaughton [10] has developed an
$O(n)$ algorithm for the case when all processors have the same speed as well as
the same memory capacities. The algorithm developed by Kafura and Shen [5] for
the case when all processors have the same speed but have different memory
sizes can be easily implemented to run in $O(n \log m)$ time. Gonzalez and Sahni
[3] have developed an $O(n + m \log m)$ algorithm for the case of uniform
processors having the same memory size. The general problem of uniform
processors with different memory sizes has been considered by Lai and Sahni [?]
and by Martel [9].

[1], [2], [8], [11], and [12] are some other references on work related to the
scheduling of multi-pipelined supercomputers.

As stated above, the problem we are considering in this paper (i.e.,
constructing minimum length schedules) is NP-hard. Hence, it is extremely
unlikely that there exist efficient (i.e., polynomial time) algorithms that
solve our problem. We shall therefore relax the requirement that the schedules
constructed be minimal and only require that the schedules be constructed
quickly and be "good".

Su and Hwang [13] have developed an efficient algorithm, SU, to schedule $n$
tasks on $m$ identical processors with the same memory size. Their algorithm
runs in $O(n)$ time and generates solutions that are quite good. Specifically,
if we let $t_a$, $w_0$ and $w_{SU}$ be as below:

$$t_a = \max\{\max_i\{t_i + t_0\},\ \sum_i (t_i + t_0)/m\}$$

$w_0$ = length of minimum length schedule

$w_{SU}$ = length of schedule generated by the Su-Hwang algorithm

then,

$$w_{SU} \le t_a + (m-1)t_0/2 \le w_0 + (m-1)t_0/2$$

Using algorithm SU, Su and Hwang further showed how a task set with tree
precedence could be scheduled such that the schedule length was no more than

where  l  is  the  height  of  the  precedence  tree. 

In section 2 we shall show how McNaughton's algorithm for the case $t_0 = 0$
can be adapted to get a fast algorithm, S, to schedule $n$ independent tasks on
$m$ identical processors with identical memory size such that:

$$w_S \le t_a + (m-1)t_0/m$$

This new algorithm may be used in place of algorithm SU to schedule tree
precedence task systems in the algorithm of [12]. The resulting algorithm
produces better schedules. In that section, we also show that the worst case
bound of $t_a + (m-1)t_0/m$ cannot be improved upon. In section 3, we consider
identical processors with different memory size and finally, in section 4, we
consider the case of uniform processors with the same memory size.


2.  Identical  Processors  With  The  Same  Memory  Size 

McNaughton's algorithm to construct a minimum length schedule for the case
$t_0 = 0$ proceeds by first computing the schedule length $t$ as below:

$$t = \max\{\max_i\{t_i\},\ \sum_i t_i/m\}$$

The $n$ tasks are now scheduled, in any order, by first using up all of
processor 1 ($P_1$), then all of $P_2$, then all of $P_3$, etc. until all $n$
tasks have been scheduled. If when scheduling a task on $P_i$ we discover that
it cannot complete by $t$, then the remainder is assigned to $P_{i+1}$ starting
at 0.
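
The wrap-around rule just described can be sketched as follows for the zero-overhead case. This is a minimal illustration rather than McNaughton's original formulation: the function name `mcnaughton`, the tuple layout `(task, processor, start, end)` and the use of exact fractions are assumptions made here for clarity.

```python
from fractions import Fraction


def mcnaughton(times, m):
    """Wrap-around schedule for t0 = 0 on m identical processors.

    Returns the schedule length t and a list of pieces
    (task, processor, start, end), with tasks and processors numbered from 1.
    """
    t = max(max(times), Fraction(sum(times), m))
    pieces = []
    proc, clock = 1, Fraction(0)
    for task, need in enumerate(times, start=1):
        need = Fraction(need)
        while need > 0:
            run = min(need, t - clock)        # fill the current processor up to t
            pieces.append((task, proc, clock, clock + run))
            clock += run
            need -= run
            if clock == t:                    # processor full: wrap to the next one
                proc, clock = proc + 1, Fraction(0)
    return t, pieces


length, pieces = mcnaughton([3, 3, 3], 2)
```

For three tasks of length 3 on two processors the sketch gives a schedule of length 9/2, with the second task split between the end of processor 1 and the start of processor 2.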

When $t_0 \ne 0$, we compute $w_S$ as below:

$$w_S = \max\{\max_i\{t_0 + t_i\},\ (\sum_i (t_0 + t_i) + (m-1)t_0)/m\}$$


A modified version of McNaughton's algorithm is used to obtain a schedule
of length at most $w_S$. The tasks are scheduled using algorithm S of Figure 4.

Note that it is possible for algorithm S to generate schedules that are
shorter than $w_S$ by up to $(m-1)t_0/m$. Theorem 1 establishes that algorithm S
always succeeds in generating a valid schedule.

Theorem 1: Algorithm S always generates a schedule of length at most $w_S$.
Proof: We first observe that there are three points in the algorithm where a
task might be scheduled. If it is scheduled at the point labeled 1:, its
scheduling satisfies criteria 1 and 2 stated earlier for valid schedules.

procedure S

  i := 1;  {task number}
  j := 1;  {processor number}
  q := m;  {last available processor}

  time_remaining := w_S;  {remaining time on processor j}

  for i := 1 to n do

    if t_0 + t_i <= time_remaining
    then begin

      1: Schedule task i on processor j for t_0 + t_i time
         beginning at time w_S - time_remaining;
         time_remaining := time_remaining - t_0 - t_i;
         if time_remaining < t_0
         then begin
           2: j := j + 1;
              time_remaining := w_S;
         end;
    end
    else begin

      if w_S - t_0 - t_i < t_0
      then begin

        3: Schedule task i on processor q from 0 to t_0 + t_i;

        4: q := q - 1;

      end

      else begin

        5: Schedule task i on processor j from w_S - time_remaining
           to w_S and on processor j+1 from 0 to 2t_0 + t_i - time_remaining;
           time_remaining := w_S + time_remaining - 2t_0 - t_i;
           j := j + 1;

      end;

    end;

end;  {of S}

Figure 4 Algorithm to schedule identical processors.
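
To make the control flow of Figure 4 concrete, here is a hedged Python transcription. It follows the five labeled points of the figure, but the function name `schedule_s`, the piece representation `(task, processor, start, end)` and the use of exact fractions are assumptions of this sketch; the strict comparisons at points 2: and 3: follow the reading used in the proof of Theorem 1.

```python
from fractions import Fraction


def schedule_s(times, m, t0):
    """Sketch of algorithm S; returns (w_s, pieces).

    Each piece is (task, processor, start, end); the first t0 of every
    piece is overhead. Tasks and processors are numbered from 1.
    """
    total = sum(t0 + t for t in times)
    w_s = max(max(t0 + t for t in times), Fraction(total + (m - 1) * t0, m))
    j, q = 1, m                    # current processor, last available processor
    remaining = w_s                # time remaining on processor j
    pieces = []
    for i, t_i in enumerate(times, start=1):
        if t0 + t_i <= remaining:
            # Point 1: the whole task fits on processor j.
            start = w_s - remaining
            pieces.append((i, j, start, start + t0 + t_i))
            remaining -= t0 + t_i
            if remaining < t0:     # Point 2: too little left to be useful
                j += 1
                remaining = w_s
        elif w_s - t0 - t_i < t0:
            # Points 3 and 4: splitting cannot pay off; use processor q whole.
            pieces.append((i, q, 0, t0 + t_i))
            q -= 1
        else:
            # Point 5: finish processor j, continue at time 0 on processor j+1.
            pieces.append((i, j, w_s - remaining, w_s))
            pieces.append((i, j + 1, 0, 2 * t0 + t_i - remaining))
            remaining = w_s + remaining - 2 * t0 - t_i
            j += 1
    return w_s, pieces


w, pieces = schedule_s([100, 100, 100], 2, 1)
```

On the three-task example from the introduction ($t_0 = 1$, all task times 100, two processors) the sketch splits the second task across the two processors and finishes at time 152.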

If task $i$ is scheduled at point 3:, then $t_0 + t_i \le w_S$ by definition of
$w_S$. Again, the scheduling of task $i$ is done in a valid way without
increasing the schedule length beyond $w_S$.

Point 5: is the only place where a task may be scheduled with a preemption.
We must show that the scheduling of the two subtasks that task $i$ is divided
into does not overlap. The sum of the task times for the two subtasks is
$2t_0 + t_i$. This quantity cannot exceed $w_S$ because if it did, then
$w_S - t_0 - t_i < t_0$ and task $i$ would be scheduled at point 3:. Hence, the
two subtasks of task $i$ do not overlap.

Finally, we need to show that by the time $j$ exceeds $q$, all tasks have been
scheduled. If this is not the case, then the schedule generated either uses more
than $m$ processors or has assigned more than one task for processing in the
same time slot on some of the processors. Let $i'$ be the first value of $i$
when an attempt is made to schedule a task on a processor that has already been
used (this processor would have been used earlier by 3:) or on a processor with
index $j$, $j > m$. Suppose that in the scheduling of the previous $i'-1$ tasks,
$j$ had been incremented $k_1$ times at 2: and $q$ had been decremented $k_2$
times at 4:. This means that on $k = k_1 + k_2$ processors there are no
preemptions.


The total capacity utilized is $\sum_{i=1}^{i'-1}(t_0 + t_i) + pt_0$, where $p$
is the number of preemptions introduced. Since
$w_S \ge [\sum_{i=1}^{n}(t_0 + t_i) + (m-1)t_0]/m$, the idle capacity on all $m$
processors together must be at least

$$\sum_{i=i'}^{n}(t_0 + t_i) + (m-p-1)t_0 \ge (m-p)t_0 + t_{i'}.$$


If $j = q$ when $i = i'$, then time_remaining on processor $j$ is less than
$t_0 + t_{i'}$ and the remaining processors have at most $t_0$ idle time each.
In fact, the total idle time on the remaining processors is no more than $kt_0$.
Hence the total idle time on the $m$ processors is less than
$kt_0 + t_0 + t_{i'}$. However, the number of preemptions in this case is
$m-k-1$. So, the remaining capacity must be at least $(k+1)t_0 + t_{i'}$, a
contradiction.

If $j > q$ when $i = i'$, then the total idle time on the $m$ processors is at
most $kt_0$. But, $p = m-k$ and the available capacity must be at least
$kt_0 + t_{i'}$. Hence the algorithm always generates a valid schedule with
length at most $w_S$. ■


Examining the definition of $w_S$, we see that $w_S \le t_a + (m-1)t_0/m$. Our
next theorem establishes that we cannot get a better bound on $w_S$.

Theorem 2: For every $m$, there exist task sets for which the minimum length
schedule is of length $t_a + (m-1)t_0/m$.

Proof: First consider the case $m = 2$, $t_0 = 1$, $t_1 = t_2 = t_3 = 5$. One
may readily verify that if no preemptions are allowed, then there is no schedule
with length less than 12. If one or more preemptions are allowed, then there is
no schedule with length less than $t_a + t_0/2 = 9.5$. This example generalizes
to the case of $m$ processors. Simply consider $m+1$ tasks of length 5 and
$t_0 = 1$. ■

We note that algorithm S is substantially simpler than the algorithm proposed
in [12]. In fact, it can be trivially implemented in hardware, thereby virtually
eliminating the scheduling overhead. For $m = 2$, the bound on $w_S$ is the
same as that on the algorithm of [12]. For other values of $m$, our bound is
better by an additive amount of $(m-1)(1/2 - 1/m)t_0$. Also, our algorithm may
be substituted into the algorithm suggested in [12] for tree precedence tasks.
The resulting algorithm will have an improved performance. Since the minimum
schedule length, $w_0$, is at least $t_a$, we obtain the relation

$$w_S \le w_0 + (m-1)t_0/m.$$

3. Identical Processors With Different Memory Size

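Our heuristic algorithm for this case is based on the algorithm suggested by
Kafura and Shen [5]. As remarked earlier, this algorithm generates optimal
schedules when $t_0 = 0$ and it runs in $O(n \log m)$ time. Let $\mu_i$ denote
the local memory size of processor $i$ and $u_j$ the local memory requirement of
task $j$. Assume that $\mu_i \ge \mu_{i+1}$, $1 \le i < m$. Let
$B_i = \{j \mid \mu_{i+1} < u_j \le \mu_i\}$, $1 \le i < m$, and
$B_m = \{j \mid u_j \le \mu_m\}$. Let $F_i = \bigcup_{j \le i} B_j$,
$1 \le i \le m$, and let $X_i = \sum_{j \in F_i} t_j$. Define $t$ as below:

$$t = \max\{\max_i\{t_i\},\ \max_i\{X_i/i\}\}$$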
The Kafura-Shen algorithm generates schedules of length $t$ by scheduling
first all jobs in $B_1$, then all in $B_2$, and so on. When tasks from $B_i$ are
being considered, processors 1 through $i$ are available. The scheduling is done
using McNaughton's scheme. It is not too difficult to see that when
$t_0 \ne 0$, this strategy can be adapted in the same way as we adapted
McNaughton's algorithm in section 2. The $w_S$ to use now is given below:

$$w_S = \max\{\max_i\{t_i + t_0\},\ \max_i\{(Y_i + (i-1)t_0)/i\}\}$$

where $Y_i = \sum_{j \in F_i}(t_j + t_0)$.

The correctness of the scheduling method may be established as in section 2.
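
The bound used in this section can be evaluated with the short sketch below. It rests on several assumptions that are not spelled out in the text: processors are given by their memory sizes in nonincreasing order, each task is a pair (processing time, memory requirement), every task fits on processor 1, and the set of tasks restricted to processors 1 through $i$ is taken to be those whose memory requirement exceeds the memory of processor $i+1$. The function name `ws_different_memory` is hypothetical.

```python
from fractions import Fraction


def ws_different_memory(proc_mem, tasks, t0):
    """Evaluate w_S for identical processors with different memory sizes.

    proc_mem: memory sizes in nonincreasing order (processor 1 first).
    tasks: list of (processing_time, memory_requirement) pairs.
    """
    m = len(proc_mem)
    best = Fraction(max(t + t0 for t, _ in tasks))
    for i in range(1, m + 1):
        if i < m:
            # Tasks too large for processor i+1 must run on processors 1..i.
            y_i = sum(t + t0 for t, u in tasks if u > proc_mem[i])
        else:
            y_i = sum(t + t0 for t, _ in tasks)   # all tasks
        best = max(best, Fraction(y_i + (i - 1) * t0, i))
    return best


# Hypothetical instance: two processors with memory 8 and 4.
bound = ws_different_memory([8, 4], [(10, 6), (4, 2), (4, 2)], 1)
```

In this made-up instance the single large task dominates and the bound evaluates to 11.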


4. Uniform Processors With Equal Memory Size

Assume that the processors are ordered by speed, i.e., $s_i \ge s_{i+1}$,
$1 \le i < m$. Let $T_i$ and $S_i$, $1 \le i \le m$, be as defined below:

$T_i$ = sum of the longest $i$ task times + $it_0$, $1 \le i < m$.

$T_m$ = sum of the $n$ task times + $nt_0$

$S_i = \sum_{j=1}^{i} s_j$, $1 \le i \le m$

When $t_0 = 0$, a minimum length schedule can be obtained in $O(n + m \log m)$
time [3]. The algorithm of [3] begins by computing the minimum schedule length,
$t$, using the formula:

$$t = \max_k\{T_k/S_k\}$$


Since the algorithm of [3] generates schedules that have no more than
$2(m-1)$ preemptions, one might conjecture that in the face of overheads of
$t_0 > 0$ per preemption, the schedule length need increase to no more than
$w_U$ as given below:

$$w_U = \max_i\{(T_i + 2(i-1)t_0)/S_i\}$$
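
The conjectured bound can be computed directly from the definitions of $T_i$ and $S_i$. The sketch below is illustrative only: the function name `conjectured_length` is hypothetical, speeds and task times are assumed to be integers or exact fractions, and the sample instance is made up.

```python
from fractions import Fraction


def conjectured_length(times, speeds, t0):
    """max over i of (T_i + 2(i-1) t0) / S_i for uniform processors."""
    m = len(speeds)
    s = sorted(speeds, reverse=True)      # fastest processor first
    t = sorted(times, reverse=True)       # longest task first
    best = Fraction(0)
    for i in range(1, m + 1):
        if i < m:
            T_i = sum(t[:i]) + i * t0     # longest i tasks, one overhead each
        else:
            T_i = sum(t) + len(t) * t0    # all n tasks
        S_i = sum(s[:i])                  # combined speed of the i fastest
        best = max(best, Fraction(T_i + 2 * (i - 1) * t0, S_i))
    return best


# Hypothetical instance: three tasks, two processors of speeds 2 and 1.
with_overhead = conjectured_length([8, 6, 4], [2, 1], 1)
without_overhead = conjectured_length([8, 6, 4], [2, 1], 0)
```

With $t_0 = 0$ the expression reduces to the minimum schedule length formula of [3]; for this instance it gives 6, and with $t_0 = 1$ it gives 23/3.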


Establishing the validity of the above conjecture is quite a bit harder than
establishing the validity of the bound $w_S$ for identical processors. Like the
algorithm of [3] for the case when $t_0 = 0$, our algorithm here will use 4
scheduling rules. However, the condition for applying each and the rules
themselves are somewhat different. The $n$ tasks shall be scheduled one-by-one.
The schedule for any given task will be obtained by using exactly one of the 4
rules.

Let us introduce some terminology first. Processor $j$ has idle time if there
is some time between 0 and $w_U$ during which no task has been assigned to it.
The interval $[a,b]$ constitutes a block of idle time on processor $j$ iff this
processor is idle throughout this interval. A block $[a,b]$ of idle time on
processor $j$ is a usable block iff $(b-a)s_j > t_0$. A set of processors with
nonoverlapping usable blocks is called a usable processor system (UPS).
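
The two definitions above translate into simple predicates. In this sketch an idle block is represented as a tuple `(a, b, speed)`, which is an assumption of the example, as are the function names; blocks that merely touch at an endpoint are treated as nonoverlapping.

```python
def capacity(block):
    """Processing capacity of an idle block (a, b, speed): (b - a) * speed."""
    a, b, speed = block
    return (b - a) * speed


def is_usable(block, t0):
    """A block is usable iff its capacity strictly exceeds the overhead t0."""
    return capacity(block) > t0


def is_ups(blocks, t0):
    """True if the usable blocks among `blocks` do not overlap in time."""
    usable = sorted(b for b in blocks if is_usable(b, t0))   # sorted by start
    return all(prev[1] <= cur[0] for prev, cur in zip(usable, usable[1:]))


# Hypothetical idle blocks on three processors, with t0 = 1.
blocks = [(0, 2, 1), (2, 5, 2), (4, 4.5, 1)]
ok = is_ups(blocks, 1)
```

Here the third block is too small to be usable, so its overlap with the second block does not prevent the system from being a UPS.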

A three processor system with idle times is shown in Figure 5(a). The heavy
lines represent nonusable idle blocks while the light lines represent usable
blocks. Note that there is no overlap amongst the usable blocks. This
represents a UPS even though some usable blocks overlap with some nonusable
blocks. A UPS will be drawn as in Figure 5(b). In this figure, only the usable
blocks are shown. Observe that unlike the DPS of [3], a UPS is not required to
consist of a continuous block of idle time from 0 to $w_U$.

Figure 5 A UPS

Let us assume that $t_1 \ge t_2 \ge \cdots \ge t_{m-1} \ge t_j$, $j \ge m$. Task
i is the ith task to be scheduled. We shall use k to denote the next task to be
scheduled. Initially, k = 1. I(k) will denote the set of processors used in the
scheduling of tasks 1, 2, ..., k-1. Initially, I(k) = {1}. idle_time(k) denotes
the total amount of processing capacity available in the usable blocks of I(k)
(i.e., the sum of the block length and speed products). NP(k) is the number of
preemptions in the schedule constructed for tasks 1, 2, ..., k-1; H(k) is the
number of usable blocks in I(k); and A(k) is the number of unusable idle blocks
in I(k). Note that each unusable block represents at most $t_0$ units of
processing.

When task k is to be scheduled, we determine which of conditions C1 - C4
(given below) holds and use the appropriate scheduling rule. Informally, these
four conditions are:

C1: Task k can be scheduled on the usable blocks of I(k) in such a way that no
usable blocks remain.

C2: There isn't enough usable capacity in I(k) to complete task k.

C3: The usable processing capacity in I(k) is enough to complete task k.
However, the usable capacity left following the scheduling of this task will
exceed $t_0$.

C4: (k = m) or there is enough usable capacity in I(k) as well as on each of
the processors not in I(k) to complete task k.
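
A hedged sketch of how conditions C1 to C3 partition the possible values of idle_time(k) is given below, using the formal thresholds stated with rules R1 to R3: $H(k)t_0 + t_k$ and $(H(k)+1)t_0 + t_k$. Condition C4 is not covered, and the handling of the exact boundary values is an assumption of this sketch, as is the function name `classify`.

```python
def classify(idle_time, usable_blocks, t_k, t0):
    """Return which of C1, C2, C3 applies, given idle_time(k) and H(k).

    Assumes C4 has already been tested and found false.
    """
    low = usable_blocks * t0 + t_k          # H(k) t0 + t_k
    high = (usable_blocks + 1) * t0 + t_k   # (H(k) + 1) t0 + t_k
    if idle_time < low:
        return "C2"    # not enough usable capacity for task k
    if idle_time <= high:
        return "C1"    # task k consumes all usable blocks
    return "C3"        # more than t0 of usable capacity would remain


# Hypothetical state: two usable blocks, t_k = 10, t0 = 1.
rule = classify(12, 2, 10, 1)
```

With two usable blocks, $t_k = 10$ and $t_0 = 1$, the thresholds are 12 and 13, so an idle capacity of 12 falls under C1.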

These conditions are specified more formally later. They are tested for in
the order C4, C1, C2, and C3. Once C4 holds, rule R4 takes over and schedules
all remaining tasks. For every k such that task k-1 is scheduled using one of
rules R1-R3, the following will be true:

1. I(k) is a UPS.

2. $|I(k)| = k$

3. $NP(k) + H(k) + A(k) - 1 \le 2(k-1)$.

When k = 1, NP(k) = 0, H(k) = 1, A(k) = 0 and we see that 1-3 above are true.

The four scheduling rules together with their associated conditions are
given below.

Rule R1

Condition C1: $H(k)t_0 + t_k \le \mathrm{idle\_time}(k) \le (H(k)+1)t_0 + t_k$
Task k is scheduled in the H(k) usable blocks of I(k). This scheduling may leave
behind an unusable block of size up to $t_0$ (Figure 6). Let j be the index of
the fastest processor not in I(k). Such a j must exist as $|I(k)| = k < m$.
Define I(k+1) to be $I(k) \cup \{j\}$. The time on processor j from 0 to $w_U$
constitutes the only usable block of I(k+1). We see that
NP(k+1) = NP(k) + H(k) - 1, H(k+1) = 1, and $A(k+1) \le A(k) + 1$. Hence,

$$\begin{aligned}
NP(k+1) + H(k+1) + A(k+1) - 1 &\le NP(k) + H(k) - 1 + 1 + A(k) + 1 - 1 \\
&= NP(k) + H(k) + A(k) - 1 + 1 \\
&\le 2(k-1) + 1 \\
&< 2k.
\end{aligned}$$

Also, I(k+1) is a UPS and $|I(k+1)| = k+1$.


Figure 6 Scheduling with rule R1 (the figure marks a possible unusable block)

Rule R2

Condition C2: $\mathrm{idle\_time}(k) < H(k)t_0 + t_k$

At this time, there isn't enough usable processing capacity in I(k) to schedule
task k. Let $I(k) = \{1, 2, \ldots, j, i_1, i_2, \ldots\}$, where
$j+1 \notin I(k)$, and let $S = \{i_1, i_2, \ldots\}$. If $S = \emptyset$, then
j = k and

$$\begin{aligned}
\mathrm{idle\_time}(k) &\ge (T_k + 2(k-1)t_0) - (T_{k-1} + NP(k)t_0 + A(k)t_0) \\
&\ge t_k + t_0 + 2(k-1)t_0 + (H(k)-1)t_0 - 2(k-1)t_0 \\
&= t_k + H(k)t_0
\end{aligned}$$

(recall k < m when rule R2 is used). This contradicts condition C2. So,
$S \ne \emptyset$. Processors in S are introduced into I by rule R3. From the
way this rule selects a processor for inclusion and the fact
$t_1 \ge t_2 \ge \cdots \ge t_k$, it follows that $t_k + t_0 \le w_U s_{j+1}$.

Let $I(k+1) = I(k) \cup \{j+1\}$. So, $|I(k+1)| = k+1$. Index the usable blocks
of I(k) 1 through H(k). Let $\tau_i$ denote the start of the ith usable block.
Let $\Delta_i$ be the processing capacity of the ith usable block (i.e., the
product of block length and processor speed). Assume that the usable blocks have
been indexed such that $\tau_i > \tau_{i+1}$, $1 \le i < H(k)$ (Figure 7). Let
$\tau_0 = w_U$. Find the least i, $i \ge 0$, such that one of the following is
true:

a) $(i+1)t_0 + t_k \le \sum_{p=1}^{i}\Delta_p + \tau_i s_{j+1} \le (i+2)t_0 + t_k$

b) $\sum_{p=1}^{i}\Delta_p + \tau_i s_{j+1} < (i+1)t_0 + t_k$

c) $i = H(k)$

Figure 7 Scheduling with rule R2(a) (the figure marks the block starts
$\tau_{H(k)}, \ldots, \tau_{i+1}$, a usable block, and a possible unusable block)

Clearly, such an i exists. The scheduling of task k depends on which of the
above conditions holds for this i. If more than one of the above hold for this
least i, then the first of them that holds determines the way to schedule task
k.

Case (a) holds

Schedule task k to completely use up the usable blocks 1 through i. Schedule
the remainder of this task on processor j+1 so as to finish at $\tau_i$ (Figure
7). The remaining usable block (if any) on j+1 begins at $\tau_i$ and ends at
$w_U$. If i = 0, then there is no usable idle time left on processor j+1. If
i > 0, then it follows that the processing capacity of j+1 from $\tau_i$ to
$w_U$ is greater than $t_0$. Also, note that the scheduling of task k might
create an unusable idle block of capacity at most $t_0$ starting at 0 on j+1. It
is not too difficult to see that NP(k+1) = NP(k) + i,
$H(k+1) \le H(k) - i + 1$, and $A(k+1) \le A(k) + 1$. Hence,

$$\begin{aligned}
NP(k+1) + H(k+1) + A(k+1) - 1 &\le NP(k) + i + H(k) - i + 1 + A(k) + 1 - 1 \\
&\le 2(k-1) + 2 \\
&= 2k.
\end{aligned}$$

We also note that I(k+1) is a UPS.

Case (b) holds

Now, i > 0 as $\tau_0 s_{j+1} \ge t_k + t_0$. So,

(1) $\sum_{p=1}^{i-1}\Delta_p + \tau_{i-1}s_{j+1} > it_0 + t_k$

and

(2) $\sum_{p=1}^{i}\Delta_p + \tau_i s_{j+1} < (i+1)t_0 + t_k$

Also, observe that:

$$\sum_{p=1}^{i-1}\Delta_p + \tau_{i-1}s_{j+1} > (i+1)t_0 + t_k$$

as otherwise case (a) occurs for i-1.

This time we assign task k so as to use up the usable blocks 1 through i-1
(Figure 8). Let $\beta$ be the end of the ith usable block and let
$s = \Delta_i/(\beta - \tau_i)$. Note that it is quite possible that
$\beta < \tau_{i-1}$ (of course, it is not possible for $\beta$ to be greater
than $\tau_{i-1}$). Let
$\delta = (it_0 + t_k - \sum_{p=1}^{i-1}\Delta_p)/s_{j+1}$. From (1) it is
evident that $\delta < \tau_{i-1}$. If $\delta \ge \beta - t_0/s_{j+1}$, then
schedule the remainder of task k on j+1 from 0 to $\delta$. If in addition,
$\delta < \beta$, then designate the time from $\delta$ to $\beta$ on j+1
unusable. The only usable block on j+1 begins at $\max\{\beta, \delta\}$ and
ends at $w_U$. We see that when task k is completed in this way,
NP(k+1) = NP(k) + i - 1, H(k+1) = H(k) + 1 - (i-1), and $A(k+1) \le A(k) + 1$.
So, $NP(k+1) + H(k+1) + A(k+1) - 1 \le 2(k-1) + 2 = 2k$.

Figure 8 Scheduling with rule R2(b)

If $\delta < \beta - t_0/s_{j+1}$, then
$\beta s_{j+1} > (i+1)t_0 + t_k - \sum_{p=1}^{i-1}\Delta_p$. From (2), we know
that
$(\beta - \tau_i)s + \tau_i s_{j+1} < (i+1)t_0 + t_k - \sum_{p=1}^{i-1}\Delta_p$.
So, there is a $\gamma$, $\tau_i < \gamma < \beta$, such that

$$\gamma s_{j+1} + (\beta - \gamma)s = (i+1)t_0 + t_k - \sum_{p=1}^{i-1}\Delta_p.$$

The remainder of task k is scheduled on j+1 from 0 to $\gamma$ and in the usable
block $\Delta_i$ from $\gamma$ to $\beta$ (Figure 9). The idle time on j+1 from
$\gamma$ to $w_U$ forms a usable block. If the remaining idle capacity in
$\Delta_i$ is no more than $t_0$, then an unusable block is created here. So,
NP(k+1) = NP(k) + i, and $H(k+1) + A(k+1) \le H(k) + A(k) - i + 2$. Hence,
$NP(k+1) + H(k+1) + A(k+1) - 1 \le 2k$. I(k+1) is readily seen to be a UPS.

Figure 9 Scheduling with rule R2(b)
Case (c) holds

It is the case that
$\sum_{p=1}^{H(k)}\Delta_p + \tau_{H(k)}s_{j+1} > (H(k)+1)t_0 + t_k$. Let
$\gamma = ((H(k)+1)t_0 + t_k - \sum_{p=1}^{H(k)}\Delta_p)/s_{j+1}$. Schedule
task k to use up the usable blocks of I(k) and on j+1 from 0 to $\gamma$ (Figure
10). It is clear that $\gamma < \tau_{H(k)}$. I(k+1) has only one usable block.
It is on processor j+1 from $\gamma$ to $w_U$. NP(k+1) = NP(k) + H(k), and
A(k+1) = A(k). So, $NP(k+1) + H(k+1) + A(k+1) - 1 < 2k$.

Figure 10 Scheduling with rule R2(c)

Rule R3

Condition C3: $\mathrm{idle\_time}(k) > (H(k)+1)t_0 + t_k$

Let q be the smallest value of r such that $r \notin I(k)$ and
$t_k + t_0 > w_U s_r$. Such a q must exist as otherwise C4 also holds and is
given priority over this rule. Let $\tau_i$, $\Delta_i$, $1 \le i \le H(k)$, be
as in rule R2. Let $I(k+1) = I(k) \cup \{q\}$. We first see that
$|I(k+1)| = k+1$.

Find the largest i, i < H(k), for which one of the following is true:

a) $(i+1)t_0 + t_k \le \sum_{p=1}^{i}\Delta_p + \tau_i s_q \le (i+2)t_0 + t_k$

b) $\sum_{p=1}^{i}\Delta_p + \tau_i s_q < (i+1)t_0 + t_k$

c) $i = 0$
Clearly, such an i exists. The scheduling of task k depends on which of the above
conditions  holds  for  this  i.  If  more  than  one  of  the  above  hold  for  this  largest  i, 
then  the  first  that  holds  determines  the  way  to  schedule  task  k. 

Case  (a)  holds 

Schedule  as  in  case  (a)  of  Rule  R2. 

Case  (b)  holds 

We have the following inequalities:

$\tau_{i+1}s_q + \sum_{p=1}^{i+1}\Delta_p > (i+2)t_0 + t_k$

and

$\tau_i s_q + \sum_{p=1}^{i}\Delta_p < (i+1)t_0 + t_k$

Let $\rho$ be the end of the usable block $\Delta_{i+1}$. From the last inequality and the relation $\rho \le \tau_i$, it follows that:

$\rho s_q + \sum_{p=1}^{i}\Delta_p < (i+2)t_0 + t_k$

Hence, there exists $\gamma$, $\tau_{i+1} < \gamma < \rho$, such that

$\gamma s_q + (\rho-\gamma)s + \sum_{p=1}^{i}\Delta_p = (i+2)t_0 + t_k$

where $s = \Delta_{i+1}/(\rho-\tau_{i+1})$. Task k is scheduled on processor q from 0 to $\gamma$, on the usable block $\Delta_{i+1}$ from $\gamma$ to $\rho$, and on the whole of the usable blocks indexed 1 through i. The idle time on processor q from $\gamma$ to $w$ may or may not form a usable block. Further, the capacity left on $\Delta_{i+1}$ may also be unusable. Regardless of the outcome for the remaining capacity on q and the (i+1)th block, we have $NP(k+1) = NP(k) + i + 1$, and $H(k+1) + A(k+1) \le H(k) + A(k) + 1 - i$. So, $NP(k+1) + H(k+1) + A(k+1) - 1 \le 2k$.
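The equation that fixes the split point in case (b) is linear in the unknown time, so it can be solved directly. The following is a minimal sketch under stated assumptions: the function name `split_point` and its parameter names are hypothetical, `full_blocks` holds the capacities of usable blocks 1 through i, and `tau_next`, `rho` and `delta_next` describe the start, end and capacity of block i+1.

```python
def split_point(t0, tk, s_q, full_blocks, tau_next, rho, delta_next):
    """Solve g*s_q + (rho - g)*s + sum(full_blocks) = (i+2)*t0 + tk for g.

    s is the rate of usable block i+1, i.e. delta_next / (rho - tau_next).
    All names are illustrative; they do not come from the paper.
    """
    i = len(full_blocks)
    s = delta_next / (rho - tau_next)
    target = (i + 2) * t0 + tk
    if s == s_q:
        raise ValueError("equation is degenerate when the two rates coincide")
    g = (target - sum(full_blocks) - rho * s) / (s_q - s)
    # The case (b) inequalities are what place g strictly inside the block.
    if not (tau_next < g < rho):
        raise ValueError("inputs do not satisfy the case (b) inequalities")
    return g


print(split_point(t0=1, tk=10, s_q=1, full_blocks=[2.0],
                  tau_next=2, rho=6, delta_next=12))  # 3.5
```

In this example the task needs 3*1 + 10 = 13 units of capacity: 2 from the first block, 3.5 from processor q, and 2.5 * 3 = 7.5 from the second block.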

Case (c) holds

When H(k) = 1, (3) follows from C3. When H(k) > 1, (3) follows because neither (a) nor (b) holds with i = 1.

(3) $\Delta_1 + \tau_1 s_q > 2t_0 + t_k$

Let $\theta$ be the end of the interval $\Delta_1$. From the choice of q and the relation $\theta \le w$, we obtain:

(4) $2t_0 + t_k > t_k + t_0 > w s_q \ge \theta s_q$

From (3) and (4), it follows that there is a $\gamma$, $\tau_1 < \gamma < \theta$, such that:

$\gamma s_q + (\theta-\gamma)\Delta_1/(\theta-\tau_1) = 2t_0 + t_k$

Schedule task k on processor q from 0 to $\gamma$ and on $\Delta_1$ from $\gamma$ to $\theta$. The remaining idle time on $\Delta_1$ and on q may or may not form usable blocks. Regardless of this, we have $NP(k+1) = NP(k) + 1$, $H(k+1) + A(k+1) \le H(k) + A(k) + 1$. So, $NP(k+1) + H(k+1) + A(k+1) - 1 \le 2k$.

Note: Before moving on to rule R4, we should observe that the schedules generated by rules R2 and R3 may in fact assign task k for less than $t_0$ units of processing on some processors. This creates no problem as the schedule can be cleaned up in the end, eliminating these assignments. Each such elimination reduces the number of preemptions by 1 and increases the value of A() by 1. So, the sum NP() + H() + A() - 1 is unchanged.

Rule R4

Condition C4: (k = m) or (C3 and $t_0 + t_k \le w s_p$ for every $p \notin I(k)$)

If k = m, then the sum of the processing capacities in the usable blocks of I(m) is at least:

$\sum_{i=m}^{n}(t_i + t_0) + 2(m-1)t_0 - (NP(m)+A(m))t_0 \ge \sum_{i=m}^{n}(t_i + t_0) + (H(m)-1)t_0$

This is just enough to schedule the remaining n-m+1 tasks on the H(m) usable blocks of I(m) in the obvious way. At most H(m)-1 preemptions will be introduced and we have the idle capacity to handle this many additional overheads.
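The step that produces the $(H(m)-1)t_0$ term in the capacity bound for k = m can be spelled out. This is a sketch that assumes the invariant $NP(k) + H(k) + A(k) - 1 \le 2(k-1)$ holds on entry to step k, which is the form the bounds established by rules R2 and R3 take when k+1 is replaced by k:

```latex
\begin{align*}
NP(m) + H(m) + A(m) - 1 &\le 2(m-1) \\
\Longrightarrow\quad 2(m-1) - \bigl(NP(m) + A(m)\bigr) &\ge H(m) - 1 .
\end{align*}
```

Multiplying through by $t_0$ gives the spare capacity of $(H(m)-1)t_0$, one overhead for each of the at most H(m)-1 additional preemptions.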

If k < m, then let $Q_1, Q_2, \ldots$ be the processors not in I(k). Let $q_i$ denote the speed of $Q_i$ and assume that the processors have been ordered such that $q_i \le q_{i+1}$, $1 \le i < m-k$. We now schedule as many tasks as possible using the procedure given in Figure 11. This scheduling procedure is quite similar to McNaughton's [10]. We need to show that the preemptive scheduling done here does not cause an overlap. Let j be the index of the first task that is scheduled with an overlap. Let $Q_p$ and $Q_{p+1}$ be the processors on which it is scheduled. Let $\delta$ be the amount of time it is assigned to $Q_{p+1}$. So, $\delta q_{p+1} + (w-\delta)q_p < 2t_0 + t_j$.

Also, there must be an r, $k \le r < \min\{m,j\}$, and a v, $1 \le v \le p$, such that $2t_0 + t_r \le w q_v$. If this is not the case, then $t_0 + t_l \le w q_v < 2t_0 + t_l$, $k \le l < \min\{m,j\}$. So, tasks k, k+1, ..., $\min\{m,j\}-1$ are scheduled to use up all of $Q_1, Q_2, \ldots, Q_{\min\{m,j\}-k}$, respectively. If j < m, then task j is to be scheduled by the then clause of Figure 11 and no preemption occurs. If $j \ge m$, then p > m-k and task j is not scheduled by Figure 11. So, we may assume that r and v as described above exist. Now, since $t_j \le t_r$ and $q_{p+1} \ge q_p \ge q_v$, it must be that $2t_0 + t_j \le \delta q_{p+1} + (w-\delta)q_p$. A contradiction. Hence, no task is scheduled with overlap.

Let numpi be the total number of preemptions and idle slots of size at most $t_0$ that are introduced. We see that if no usable block remains on $Q_{m-k}$, then numpi $\le$ m-k. Otherwise, numpi $\le$ m-k-1.

If j > n when this procedure terminates, then all tasks have been scheduled and we need go no further. If $j \le n$, then it is necessary to schedule some tasks in the usable blocks of I(k). If the idle capacity left on $Q_{m-k}$ is no more than $t_0$, then the usable capacity in I(k) is at least

$\sum_{i=1}^{n}(t_i + t_0) + 2(m-1)t_0 - \sum_{i=1}^{j-1}(t_i + t_0) - (NP(k) + A(k) + numpi)t_0$

$\ge \sum_{i=j}^{n}(t_i + t_0) + 2(m-1)t_0 - \{2(k-1) + 1 - H(k) + m - k\}t_0$

$= \sum_{i=j}^{n}(t_i + t_0) + (m-k+H(k)-1)t_0$.

This is enough capacity to process the remaining tasks in a straightforward way.

The final case to consider is when the idle capacity left on $Q_{m-k}$ exceeds $t_0$. Let the idle time on $Q_{m-k}$ begin at $\delta$ and go up to $w$. The capacity associated with this time is less than $t_0 + t_j$. If there is no overlap between the idle time on $Q_{m-k}$ and the usable blocks of I(k), then we may schedule the remaining tasks on the H(k)+1 usable blocks in a straightforward way, introducing at most H(k) additional preemptions and idle slots of capacity at most $t_0$ each. We may verify that enough capacity exists for this. So, assume that there is some overlap. We
p := 1; j := k; idle_time := w*q_1;   {capacity of Q_1 over [0, w]}
repeat
    if t_0 + t_j <= idle_time
    then begin
        schedule j on Q_p;
        idle_time := idle_time - t_0 - t_j;
        if idle_time < t_0
        then begin   {too little left on Q_p; move to the next processor}
            p := p+1;
            idle_time := w*q_p;
        end;
    end
    else begin
        if p = m-k then exit;
        schedule j on Q_p up to w and on Q_{p+1} beginning at 0;
        {this requires exactly one preemption}
        idle_time := w*q_{p+1} + idle_time - 2*t_0 - t_j;
        p := p+1;
    end;
    j := j+1;
until j>n or p>m-k;

Figure 11
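The procedure of Figure 11 can be sketched in Python as follows. This is an illustrative transcription rather than the authors' implementation: the function name, the 0-based indexing, the returned tuple layout, and the initial idle capacity (taken to be the full capacity of the slowest new processor over the schedule length `w`) are assumptions, and start and end times are derived by placing the idle capacity at the end of each processor's interval.

```python
def fill_new_processors(times, speeds, t0, w):
    """Sketch of the Figure 11 loop (0-based indices, hypothetical names).

    times  -- processing times of tasks k, k+1, ..., n
    speeds -- nondecreasing speeds of the processors not in I(k)
    Returns (pieces, next_task, idle): pieces are tuples
    (task, processor, start, end); next_task is the first task left
    unscheduled; idle is the capacity left on the last processor used.
    """
    pieces = []
    p, j = 0, 0
    idle = w * speeds[0]              # assumed initial value: all of Q_1 is idle
    while j < len(times) and p < len(speeds):
        need = t0 + times[j]          # one overhead plus the processing time
        start = w - idle / speeds[p]  # idle capacity sits at the end of Q_p
        if need <= idle:
            pieces.append((j, p, start, start + need / speeds[p]))
            idle -= need
            if idle < t0:             # too little left: move to next processor
                p += 1
                if p < len(speeds):
                    idle = w * speeds[p]
        else:
            if p == len(speeds) - 1:  # no further processor to spill onto
                break
            # Use up Q_p, then continue on Q_{p+1} from time 0. This is one
            # preemption, hence a second overhead t0.
            delta = (need + t0 - idle) / speeds[p + 1]
            pieces.append((j, p, start, w))
            pieces.append((j, p + 1, 0.0, delta))
            idle = w * speeds[p + 1] + idle - need - t0
            p += 1
        j += 1
    return pieces, j, idle


pieces, next_task, idle = fill_new_processors([5, 4, 3], [1, 2], t0=1, w=8)
```

In this run the second task is split: it occupies the first processor from 6 to 8 and the second from 0 to 2, so its two pieces do not overlap in time.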

have the situation of Figure 12. For convenience, we have numbered the blocks left to right in this figure. r is the highest index such that block r of I(k) has some overlap with the idle time on $Q_{m-k}$. Clearly, $r \ge 1$. Let the capacity of the ith block be $\Delta_i$ and let $s = q_{m-k}$.

If $\sum_{i=1}^{r-1}\Delta_i + (w-\delta)s \ge rt_0 + t_j$, then schedule task j to use up all of the idle time on $Q_{m-k}$ and as much of $\Delta_1, \Delta_2, \ldots, \Delta_{r-1}$ as needed to complete task j. One may easily show that there is enough capacity left to complete the remaining tasks by scheduling them as for the case when k = m.

Figure 12

If $\sum_{p=1}^{r-1}\Delta_p + \Delta' + (w-\delta)s \ge (r+1)t_0 + t_j$, where $\Delta'$ is the capacity available in the rth block from $\tau_r$ up to $\delta$, then again schedule task j to use up all of the idle time on $Q_{m-k}$, all of $\Delta_i$, $1 \le i < r$, and the appropriate needed fraction of $\Delta'$. Once again, we may verify that there is enough remaining capacity in I(k) to complete the remaining tasks by scheduling them as we did for the case k = m.

Otherwise, from C4 and k < m, we see that there is an $i \ge r$ for which $\sum_{p=1}^{i}\Delta_p + (w-\rho_i)s \ge (i+1)t_0 + t_j$ ($\rho_i$ is the end of the ith usable block). Find the least i for which this is true. It follows that $\sum_{p=1}^{i-1}\Delta_p + (w-\tau_i)s < (i+1)t_0 + t_j$. Hence, there is a $\gamma$, $\tau_i < \gamma \le \rho_i$, such that task j can be completed by scheduling it on all of $\Delta_p$, $1 \le p < i$, on the ith usable block from $\tau_i$ to $\gamma$, and on $Q_{m-k}$ from $\gamma$ to $w$. One may verify that the remaining capacity is enough to complete the remaining tasks. Since all remaining usable blocks are nonoverlapping, the remaining tasks are easily scheduled.

Complexity 

The scheduling algorithm described above can be implemented in O(n + m log m) time. O(m log m) time is needed to order the processors by speed, and O(n + m log m) time is needed to obtain the m longest tasks in sorted order.
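One way to obtain the m longest tasks in sorted order within this kind of bound is to select first and sort afterwards. The sketch below is an assumption about how this step could be done, not the authors' method: it uses a randomized quickselect, so the O(n) selection cost holds in expectation only (a deterministic linear-time selection would be needed for a worst-case guarantee), followed by an O(m log m) sort of the selected prefix. The function name is hypothetical.

```python
import random


def m_longest_sorted(times, m):
    """Return the m largest values of times in nonincreasing order."""
    items = list(times)
    m = min(m, len(items))
    if m == 0:
        return []
    k = m - 1                      # position of the m-th largest value
    lo, hi = 0, len(items) - 1
    while lo < hi:
        pivot = items[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:              # Hoare partition, larger values first
            while items[i] > pivot:
                i += 1
            while items[j] < pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        # Now items[lo..j] >= pivot >= items[i..hi].
        if k <= j:
            hi = j
        elif k >= i:
            lo = i
        else:
            break                  # items[k] is already in its final place
    return sorted(items[:m], reverse=True)


print(m_longest_sorted([3, 9, 1, 7, 5, 7], 3))  # [9, 7, 7]
```

For small inputs the standard library call `heapq.nlargest(m, times)` gives the same result with O(n log m) comparisons.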
5. Conclusions

We have shown that it is possible to efficiently generate "good" schedules for various systems of processors in the face of preemptive overheads. For the case of identical processors, with or without different memory sizes, the schedules generated are within $(m-1)t_0/m$ of the optimal schedules. When processors have different speeds but equal memory size, the schedules generated by our algorithm are within $\max\{2(i-1)t_0/S_i\}$ of the optimal schedule length. Our result for identical processors represents an improvement over the results obtained in [12].